1.0 Data set

data_file   <- "/Users/zzahir1978/Desktop/Sample data/en.sahih.txt"
##            Name Verse
##   1: Al-Fatihah     7
##   2: Al-Baqarah   286
##   3:  Ale Imran   200
##   4:   An-Nisa'   176
##   5: Al-Ma'idah   120
##  ---                 
## 110:    An-Nasr     3
## 111:   Al-Masad     5
## 112:  Al-Ikhlas     4
## 113:   Al-Falaq     5
## 114:     Al-Nas     6
##                  V1        V2
## 1:   Data Size (MB)      0.86
## 2:      Nos.Of Line   6249.00
## 3: Nos.Of Character 891800.00
## 4:     Nos.Of Words 158992.00
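
The summary above (file size, line count, character count and word count) can be reproduced with base R. The following is a minimal sketch, not the original code: `summarise_text_file` is a hypothetical helper, and a small temporary file with made-up content stands in for `data_file`.

```r
# Hypothetical helper: summarise a plain-text file read line by line.
summarise_text_file <- function(path) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  # Split on runs of whitespace and drop empty tokens to count words.
  words <- unlist(strsplit(lines, "\\s+"))
  words <- words[nzchar(words)]
  data.frame(
    metric = c("Data Size (MB)", "Nos.Of Line", "Nos.Of Character", "Nos.Of Words"),
    value  = c(
      round(file.size(path) / 1024^2, 2),  # bytes to megabytes
      length(lines),
      sum(nchar(lines)),                   # characters, excluding line breaks
      length(words)
    )
  )
}

# Usage on a temporary file (made-up content), so the sketch runs without the data set.
tmp <- tempfile(fileext = ".txt")
writeLines(c("alpha beta gamma", "delta epsilon"), tmp)
stats <- summarise_text_file(tmp)
print(stats)
```

Whether the character count should include line breaks is a design choice; this sketch excludes them.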

2.0 Compute sample sizes in terms of lines

##    data_size
## 1:    4374.3
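
The value 4374.3 equals 70% of the 6249 lines reported in section 1.0, so the sample appears to be a 70% fraction of the file. A minimal sketch under that assumption follows; the fraction, the seed and the variable names other than `data_size` are not taken from the source.

```r
n_lines         <- 6249   # line count from the summary table in section 1.0
sample_fraction <- 0.7    # assumption: inferred from 4374.3 / 6249

data_size <- n_lines * sample_fraction
print(data_size)

# Sampling needs a whole number of lines, so the fractional size is rounded down.
set.seed(1)  # arbitrary seed, for reproducibility only
sample_idx <- sample(seq_len(n_lines), size = floor(data_size))
length(sample_idx)
```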

3.0 Text Data Analysis Results

3.1 Most frequent and least frequent words

3.1.1 Top 10 most frequent words

##       word count
##  1:  allah  2065
##  2:   will  1664
##  3: indeed  1044
##  4:   lord   670
##  5:   said   567
##  6:    say   547
##  7: people   508
##  8:   upon   443
##  9: except   333
## 10:  among   333
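
A frequency table like the one above can be built in base R by lowercasing the text, stripping punctuation, splitting on whitespace and tabulating. This is only a sketch: `count_words` is a hypothetical helper, the input lines are made up, and the stop-word list is a placeholder, since the source does not say which cleaning steps or stop words were used.

```r
# Hypothetical helper: count word frequencies in a character vector of lines.
count_words <- function(lines, stop_words = character(0)) {
  text  <- tolower(paste(lines, collapse = " "))
  text  <- gsub("[^a-z' ]+", " ", text)   # keep letters, apostrophes and spaces
  words <- unlist(strsplit(text, "\\s+"))
  words <- words[nzchar(words) & !(words %in% stop_words)]
  freq  <- sort(table(words), decreasing = TRUE)
  data.frame(word = names(freq), count = as.integer(freq),
             stringsAsFactors = FALSE)
}

# Made-up example lines, not taken from the data set.
lines <- c("The lord said to the people",
           "Say: the people will hear",
           "The lord will say")
freq <- count_words(lines, stop_words = c("the", "to"))
head(freq, 10)   # the ten most frequent words
```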

3.1.2 Ten least frequent words

##          word count
##  1:    losers    31
##  2: competent    31
##  3:    former    31
##  4:     thing    31
##  5:      eyes    31
##  6: criminals    32
##  7:     wrong    32
##  8:  grateful    32
##  9: mountains    32
## 10:     gives    32
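
Counts of 31 and 32 suggest these are the rarest words within an already filtered list, since the rarest words in the full text would typically occur only once. The sketch below shows one way such a result could arise; the threshold `min_count` is a hypothetical value and is not stated in the source.

```r
# A subset of the counts shown in sections 3.1.1 and 3.1.2.
freq <- c(allah = 2065, will = 1664, indeed = 1044,
          losers = 31, competent = 31, criminals = 32)

min_count <- 31                  # assumption: hypothetical minimum-frequency cutoff
kept  <- freq[freq >= min_count]
least <- head(sort(kept), 3)     # ascending sort: rarest retained words first
print(least)
```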

3.1.3 Plotting 10 Most Frequent Words

3.1.4 Plotting 10 Least Frequent Words
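
The plots themselves are not reproduced here. As a rough sketch, both charts can be drawn with base R `barplot` from the tables in sections 3.1.1 and 3.1.2; `plot_word_counts` is a hypothetical helper, and the original report may have used a different plotting library.

```r
# Hypothetical helper: horizontal bar chart of word counts.
plot_word_counts <- function(df, title) {
  df <- df[order(df$count), ]        # ascending, so the largest bar ends up on top
  op <- par(mar = c(4, 7, 2, 1))     # widen the left margin for the word labels
  on.exit(par(op))
  mids <- barplot(df$count, names.arg = df$word, horiz = TRUE, las = 1,
                  xlab = "Count", main = title)
  invisible(mids)                    # bar midpoints, useful for adding labels later
}

# Values copied from the tables in sections 3.1.1 and 3.1.2.
top10 <- data.frame(
  word  = c("allah", "will", "indeed", "lord", "said",
            "say", "people", "upon", "except", "among"),
  count = c(2065, 1664, 1044, 670, 567, 547, 508, 443, 333, 333)
)
least10 <- data.frame(
  word  = c("losers", "competent", "former", "thing", "eyes",
            "criminals", "wrong", "grateful", "mountains", "gives"),
  count = c(31, 31, 31, 31, 31, 32, 32, 32, 32, 32)
)

mids_top   <- plot_word_counts(top10, "10 most frequent words")
mids_least <- plot_word_counts(least10, "10 least frequent words")
```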

3.1.5 Creating a Word Cloud
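
The word cloud image is not reproduced here. One common way to draw it is the `wordcloud` package, sketched below with the top-10 counts from section 3.1.1. Which package and settings the original report used is not stated, so the argument values are assumptions.

```r
# Values copied from the table in section 3.1.1.
top10 <- data.frame(
  word  = c("allah", "will", "indeed", "lord", "said",
            "say", "people", "upon", "except", "among"),
  count = c(2065, 1664, 1044, 670, 567, 547, 508, 443, 333, 333)
)

# Draw only if the package is installed; install.packages("wordcloud") otherwise.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  set.seed(42)  # arbitrary seed: word placement is randomised
  wordcloud::wordcloud(words = top10$word, freq = top10$count,
                       min.freq = 1, max.words = 100,
                       random.order = FALSE)  # most frequent words in the centre
} else {
  message("Package 'wordcloud' is not installed; skipping the plot.")
}
```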

